Skip to content

Major refactor to move multi-threaded load generator to async event loops#282

Merged
nv-alicheng merged 4 commits intomainfrom
feat/alicheng-pubsub-integration
Apr 22, 2026
Merged

Major refactor to move multi-threaded load generator to async event loops#282
nv-alicheng merged 4 commits intomainfrom
feat/alicheng-pubsub-integration

Conversation

@nv-alicheng
Copy link
Copy Markdown
Collaborator

What does this PR do?

Major refactor which moves the multi-threaded system to an event loop based one to reduced Python context switching for higher throughput and performance.

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

This merges from a side branch which consists of multiple PRs:

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@nv-alicheng nv-alicheng requested a review from a team April 14, 2026 02:07
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 14, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions Bot requested review from arekay-nv and nvzhihanj April 14, 2026 02:07
Comment thread src/inference_endpoint/async_utils/services/metrics_aggregator/aggregator.py Dismissed
Comment thread tests/unit/load_generator/test_async_session.py Fixed
Comment thread src/inference_endpoint/load_generator/session.py Fixed
Comment thread src/inference_endpoint/load_generator/session.py Fixed
Comment thread src/inference_endpoint/load_generator/session.py Fixed
Comment thread src/inference_endpoint/load_generator/session.py Fixed
Comment thread src/inference_endpoint/load_generator/strategy.py Dismissed
Comment thread src/inference_endpoint/load_generator/strategy.py Dismissed
Comment thread src/inference_endpoint/load_generator/strategy.py Dismissed
Comment thread tests/unit/load_generator/test_async_session.py Fixed
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request transitions the load generator to an asynchronous architecture and replaces the SQLite-based metrics pipeline with a high-performance, mmap-backed key-value store. It introduces a new BenchmarkSession for phase orchestration and optimized async strategies for Poisson, burst, and concurrency load patterns. The review feedback identifies a potential race condition caused by premature temporary directory deletion in the scoring artifact logic, suggests using generic placeholders for endpoint URLs to improve configuration portability, and recommends dynamically registering metric keys using enums to enhance code maintainability and prevent silent failures.

Comment thread src/inference_endpoint/commands/benchmark/execute.py Outdated
Comment thread configs/gptoss_test.yaml Outdated
Comment thread src/inference_endpoint/commands/benchmark/execute.py
Comment thread src/inference_endpoint/load_generator/session.py Fixed
Copy link
Copy Markdown
Collaborator

@arekay-nv arekay-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a first pass. Will finish up tomorrow.

Comment thread configs/gptoss_test.yaml Outdated
Comment thread tests/unit/async_utils/services/metrics_aggregator/test_kv_store.py
Comment thread tests/unit/async_utils/services/metrics_aggregator/test_kv_store.py
Copy link
Copy Markdown
Collaborator

@arekay-nv arekay-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part 2 of reviews - next one should wrap it up

Comment thread docs/async_utils/transport/zmq/ready_check_design.md
Comment thread docs/metrics/report_design.md
Comment thread tests/unit/metrics/test_report_builder.py
Comment thread tests/unit/load_generator/test_sample_order.py Outdated
Comment thread tests/unit/load_generator/test_sample_order.py
Comment thread tests/unit/async_utils/services/metrics_aggregator/test_aggregator.py Outdated
Comment thread tests/unit/async_utils/services/metrics_aggregator/conftest.py
Copy link
Copy Markdown
Collaborator

@arekay-nv arekay-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work - thanks.
I will run some perf test offline, but looks good!

Comment thread src/inference_endpoint/commands/benchmark/execute.py Outdated
Comment thread src/inference_endpoint/commands/benchmark/execute.py Outdated
Comment thread src/inference_endpoint/commands/benchmark/execute.py Outdated
nv-alicheng added a commit that referenced this pull request Apr 20, 2026
- docs(report_design): add "reports reproducible from event log" principle
- refactor(metrics_table): rename subtract_field -> delta_start_fieldname
- docs(metrics_table): reword ISL token_ids/text comments so SGLang/OpenAI
  are examples, not defining conditions
- test(kv_store): pin empty SeriesStats min/max sentinels; add snapshot
  isolation regression test
- test(aggregator): add explanatory messages to tracking-window asserts
- test(report_builder): pin max/std_dev on empty compute_summary
- test(sample_order): parametrize over [3, 100, 10_000] dataset sizes
- test(zmq_pool_transport): collapse two pool transport classes into one
  parametrized over (num_workers, create_publisher)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nv-alicheng added a commit that referenced this pull request Apr 21, 2026
Addresses PR #282 thread T33: reframe the docstring so MetricsAggregator
and Harmony read as examples rather than a closed list of consumers, and
state explicitly that this function never loads or downloads the tokenizer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nv-alicheng added a commit that referenced this pull request Apr 21, 2026
* chore: address PR review comments — fix tmpfs cleanup, use metric enums, fix templates

- Fix premature tmpfs deletion in _write_scoring_artifacts (cleanup
  now owned solely by run_benchmark's finally block)
- Replace hardcoded metric key strings in _setup_kv_reader with
  MetricCounterKey/MetricSeriesKey enum iteration
- Fix config templates: replace placeholder URLs with http://localhost:8000,
  remove nonexistent record_worker_events field, fix YAML indentation
- Replace internal hostname in gptoss_test.yaml with localhost

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Disable warmup by default until rules are defined

* Remove stray config file

* Add TPOT_NS to _STREAMING_ONLY metrics set

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* chore: address second round of PR #282 review comments

- docs(report_design): add "reports reproducible from event log" principle
- refactor(metrics_table): rename subtract_field -> delta_start_fieldname
- docs(metrics_table): reword ISL token_ids/text comments so SGLang/OpenAI
  are examples, not defining conditions
- test(kv_store): pin empty SeriesStats min/max sentinels; add snapshot
  isolation regression test
- test(aggregator): add explanatory messages to tracking-window asserts
- test(report_builder): pin max/std_dev on empty compute_summary
- test(sample_order): parametrize over [3, 100, 10_000] dataset sizes
- test(zmq_pool_transport): collapse two pool transport classes into one
  parametrized over (num_workers, create_publisher)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: clarify _check_tokenizer_exists is a probe, consumers are examples

Addresses PR #282 thread T33: reframe the docstring so MetricsAggregator
and Harmony read as examples rather than a closed list of consumers, and
state explicitly that this function never loads or downloads the tokenizer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Comment thread tests/unit/load_generator/test_async_session.py Dismissed
Comment thread src/inference_endpoint/load_generator/session.py Dismissed
Comment thread src/inference_endpoint/load_generator/session.py Dismissed
Comment thread src/inference_endpoint/load_generator/session.py Dismissed
Comment thread src/inference_endpoint/load_generator/session.py Dismissed
Comment thread src/inference_endpoint/load_generator/session.py Dismissed
nv-alicheng added a commit that referenced this pull request Apr 21, 2026
Squashed combination of 8 commits developed on feat/alicheng-pubsub-integration:

- Initial cleanup to remove all features that will be replaced and
  permanently removed (#214)
- Add KVStore, ready-check mechanism, and ServiceLauncher (#215)
- Add report building (#216)
- Fix rebase errors
- Make Loadgen Async (#255)
- Do not use /dev/shm tmpfs for KVStore on ARM, fix rebase errors in
  e2e_oracle tests
- Add batched publishing to pubsub (#281)
- Disable warmup temporarily, bugfixes. (#288)

Squashed locally prior to rebase onto origin/main to avoid a
case-insensitive-filesystem conflict on docs/load_generator/DESIGN.md
between intermediate commits. Full per-sub-PR history is preserved in
the review threads on PR #282 and on the backup ref
backup/pre-squash-pubsub-integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nv-alicheng nv-alicheng force-pushed the feat/alicheng-pubsub-integration branch from 4c785b2 to aaf6f9b Compare April 21, 2026 20:15
Comment thread src/inference_endpoint/commands/benchmark/execute.py
Copy link
Copy Markdown
Collaborator

@viraatc viraatc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! thanks!

nv-alicheng and others added 2 commits April 22, 2026 12:50
Squashed combination of 8 commits developed on feat/alicheng-pubsub-integration:

- Initial cleanup to remove all features that will be replaced and
  permanently removed (#214)
- Add KVStore, ready-check mechanism, and ServiceLauncher (#215)
- Add report building (#216)
- Fix rebase errors
- Make Loadgen Async (#255)
- Do not use /dev/shm tmpfs for KVStore on ARM, fix rebase errors in
  e2e_oracle tests
- Add batched publishing to pubsub (#281)
- Disable warmup temporarily, bugfixes. (#288)

Squashed locally prior to rebase onto origin/main to avoid a
case-insensitive-filesystem conflict on docs/load_generator/DESIGN.md
between intermediate commits. Full per-sub-PR history is preserved in
the review threads on PR #282 and on the backup ref
backup/pre-squash-pubsub-integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nv-alicheng nv-alicheng force-pushed the feat/alicheng-pubsub-integration branch from aaf6f9b to d44be10 Compare April 22, 2026 20:02
@nv-alicheng nv-alicheng merged commit 2a60f63 into main Apr 22, 2026
7 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 22, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants